-
Notifications
You must be signed in to change notification settings - Fork 996
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Fix fastavro version used in Feast to avoid Timestamp delta error #490
Fix fastavro version used in Feast to avoid Timestamp delta error #490
Conversation
[APPROVALNOTIFIER] This PR is APPROVED This pull-request has been approved by: davidheryanto The full list of commands accepted by this bot can be found here. The pull request process is described here
Needs approval from an approver in each of these files:
Approvers can indicate their approval by writing |
31db636
to
4958254
Compare
@davidheryanto This sounds like it could be a bug in fastavro caused by some new changes. As the maintainer of that project I'd like to see if we can fix that. Do you think you could provide some details on the problem? Thanks! |
Hi @scottbelden, thanks for checking. Actually I'm not exactly sure what is the root cause, but I know this Pandas cython module is where the exception is thrown To reproduce the error, assuming you have Docker installed: docker run --rm -it continuumio/miniconda3:4.7.12 bash
# Once in the shell in the container
# pandavro depends on fastavro
pip install pandavro==1.5
python <<EOF
from pandavro import to_avro
from datetime import datetime
import pytz
import pandas as pd
df = pd.DataFrame({"datetime": [datetime.utcnow().replace(tzinfo=pytz.utc)]})
to_avro("/tmp/out.avro", df)
EOF Will throw this error with stacktrace Traceback (most recent call last):
File "<stdin>", line 7, in <module>
File "/opt/conda/lib/python3.7/site-packages/pandavro/__init__.py", line 151, in to_avro
records=df.to_dict('records'), codec=codec)
File "/opt/conda/lib/python3.7/site-packages/fastavro/_write_py.py", line 579, in writer
output.write(record)
File "/opt/conda/lib/python3.7/site-packages/fastavro/_write_py.py", line 421, in write
write_data(self.io, record, self.schema)
File "/opt/conda/lib/python3.7/site-packages/fastavro/_write_py.py", line 234, in write_data
return fn(encoder, datum, schema)
File "/opt/conda/lib/python3.7/site-packages/fastavro/_write_py.py", line 189, in write_record
name, field.get('default')), field['type'])
File "/opt/conda/lib/python3.7/site-packages/fastavro/_write_py.py", line 234, in write_data
return fn(encoder, datum, schema)
File "/opt/conda/lib/python3.7/site-packages/fastavro/_write_py.py", line 154, in write_union
if validate(datum, candidate, raise_errors=False):
File "/opt/conda/lib/python3.7/site-packages/fastavro/_validation_py.py", line 369, in validate
datum = prepare(datum, schema)
File "/opt/conda/lib/python3.7/site-packages/fastavro/_logical_writers_py.py", line 44, in prepare_timestamp_micros
delta = (data - epoch)
File "pandas/_libs/tslibs/c_timestamp.pyx", line 296, in pandas._libs.tslibs.c_timestamp._Timestamp.__sub__
TypeError: Timestamp subtraction must have the same timezones or no timezones If i downgrade, to Strangely when I do not use Python from miniconda, say I use the Python official Docker image, it's ok. The timestamp subtraction error is not there. docker run --rm -it python:3.7 bash
pip install pandavro==1.5
python <<EOF
from pandavro import to_avro
from datetime import datetime
import pytz
import pandas as pd
df = pd.DataFrame({"datetime": [datetime.utcnow().replace(tzinfo=pytz.utc)]})
to_avro("/tmp/out.avro", df)
EOF I attach the output of pip list in these 2 occasions although I don't notice anything wrong. From the miniconda that throws error
From the official Python Docker image that does not throw error
|
/lgtm |
@davidheryanto The difference you are seeing is a slight difference in our python vs. cython implementation. However, it seems like the source of the error is more so because we create our own version of a UTC tzinfo. I have a PR that uses |
Thanks @scottbelden for the update. |
…ast-dev#490) * Fix fastavro version used in feast to 0.22.9 * Print python packages version used when e2e test fails (cherry picked from commit 7586da6)
* Fix fastavro version used in feast to 0.22.9 * Print python packages version used when e2e test fails (cherry picked from commit 7586da6)
@davidheryanto I published version |
Thanks @scottbelden |
What this PR does / why we need it:
This fixes end to end batch test
#488 (comment)
http://prow.feast.ai/view/gcs/feast-templocation-kf-feast/pr-logs/pull/gojek_feast/488/test-end-to-end-batch/1231860946993942532
Which issue(s) this PR fixes:
Does this PR introduce a user-facing change?:
This PR also added debugging info of all Python packages version used during end to end test, when the test fails, to aid debugging failed tests.
NOTE
Probably Feast Python SDK should be updated so it can work with
fastavro
version0.22.10
and newer rather than fixing it to a specific version of fastavro?